---
author: hongbo
date: "2019-01-11"
previewImg:
title: In search of lightning feedback loop in a large codebase
description: |
---

## Feedback loops

As software developers, when we touch a codebase, we want the edit-compile
cycle, to be as short as possible. Studies show that feedback loop less than
1s, or 100ms will prevent devleopers getting distracted.

This seems to be possible in a small project, but as project grows to 10k
files, or even more, 100k files in a large company like Google, Facebook, this
seems to be extremely challenging.

In this article, we are talking about how BuckleScript - it supports both OCaml
and ReasonML syntax - is trying to solve this issue.

The edit-build cycle in BuckleScript consists of two components: the compiler
which does type checking and code generation and the scheduler which figures
out what to rebuild and how to do it concurrently.

## The importance of compiler's cold performance

In order to reduce the edit-build latency, some languages adopt an approach of
in memory compiler + watch mode. We think this is not a scalable or reliable
approach.

A compiler is a complex piece of software, the chance that a compiler has
memory leak is not low. It is not observed in real world since compiler is used
mostly in a short-lived setting, it starts up fast and dies off quickly.

However, this is decimated when compiler is put in server mode. When we have
10k or 100k files held in memory, it is very easy to observe OOM (out of memory
issues).

We figured that to deliver a scalable and reliable system, it is better to
decouple the compiler's complexity from the scheduler. When the compiler cold
starts and dies off quickly, the operating system process mechanism serves as
an obviously correct garbage collector, which increases the reliability of the
whole system.

To reduce such latency in a single compiler's workload, we spent lots of time
tweaking the performance of the compiler itself, for example, rewriting the hot
path in C code, most BuckleScript compiler source code is written in an
imperative C-style to avoid allocation.

To have a general idea of how fast BuckleScript's compiler runs:

```
test>cat fib.ml
let rec fib = function
  | 0 | 1 -> 1
  | n -> fib (n - 1) + fib (n - 2)
```

```
test>time /usr/local/lib/node_modules/bs-platform/lib/bsc.exe -bs-cmi -bs-cmj -c fib.ml

real    0m0.008s
user    0m0.004s
sys     0m0.003s
```

Having a compiler run fast with a cold start lays the groundwork so that it
won't be the bottleneck in the whole process. This is because when the
architecture decouple the compiler from the scheduler, the compiler will be
invoked by the scheduler hunderds of thousands of times in a build cycle, the
latency of the compiler in a single compilation unit will add up.

## The art of being incremental

Having a compiler running fast in cold mode does not solve the scalability
issue alone. Take a code base which contains 100K source files for example,
100ms per file would result in 10,000s latency which is not acceptable. To
solve such issue, we need reduce the workload as much as possible during each
edit-build cycle.

In a statically typed language, suppose we have two compilation units A , B,
and B depends on A. Whenever A changes, B gets recompiled, to make it worse,
the recompilation propogates, all dependencies of B get recompiled, which will
result in a snowball effect. In this model, whenever we touch the non-leaf
compilation unit, the latency would be large.

The key observation here for BuckleScript is that B does not really depend on
the last modified time of A, it depends on the intermediate output of A (e.g,
`.cmj` and `.cmi` file) which may not change even if we modify A. What we work
on here is to reduce the probability of changing A's intermediate output due to
changing A. With the integration of a scheduler, it can help stop the
propagation as early as possible.

In BuckleScript, each compilation unit is composed of two files, the
implementation file and interface file which are compiled to intermediate
output .cmj and .cmi separately.

The interface builds are completely separate from implemetation, it does not
depend on .cmj (implementation intermediate output) at all.

Suppose the interface is not changed, whenever we touch the implementation
file, BuckleScript designs the data structure of `.cmj` in a way that the
content of `.cmj` is seldom changed. (Let's say its probability is 0.05 which
is rare in cases when the arity of a function changed)

If neither `.cmj` nor `.cmi` gets changed, the scheduler would stop
propagation.

Suppose the `.cmj` file is still changed (P = 0.05), A's dependency B would get
recompiled. Note `B.cmi` only depends on `A.cmi`, so it will not get compiled,
only `B.cmj` will get recompiled, its chances that `B.cmj` gets changed will be
even lower. In practice, the probabiity of the length of propagation chain is
more than two is less than

```
0.05 * 0.05 = 0.0025
```

This means when we are in an edit-build cycle, whenever A gets changed, it may
have many direct dependencies, but the longest rebuild sequence will get
settled in at most two compilation units. Suppose the scheduler schedule the
tasks in parallel, the longest sequence is bound by two, which means the
rebuild cycle is very close to compiling two compilation units.

In implementation, we also generalized the idea of stopping propogation of .cmi
changes, so whenever we are adding some comments in the interface file, it will
get settled quickly.

The worst case is that the root of an interface changed. In such case, it does
not mean all its dependencies will get recompiled, it depends on how many
dependencies' interface depend on the root's interface. The cool thing is that
since interface's dependency chain is completely decoupled from
implementation's dependency, actually it used to be a subset of
implementation's dependency chain, so only a subset of its dependencies will
get compiled. We will see a concrete an example later.

## A fast scheduler

As we said, the time spent in an edit-build cycle is mostly composed of two
parts: compiling invoking the compiler, scheduling.

We are reusing the very fast scheduler provided by
[Ninja](https://ninja-build.org/manual.html)

> Where other build systems are high-level languages, Ninja aims to be an
> assembler.

BuckleScript outputs assembler style instructions consumed by Ninja which does
scheduling very fast and provides good parallelism. By integrating with Ninja's
`restat` attribute, we are able to implement the idea of a specialized content
based build system.

It is fast enough for 99% of use cases. For a project less than 1k files, the
time spent in an edit-build cycle is dominated by the compiler, but as project
grows to 10K files or even more, the time spent in compiler is stable and
bounded due to our incremental design, however, the time will start be
dominated by the scheduler.

For a nop build around 10k files, it takes around 700ms for Ninja to figure out
nothing needs to be rebuilt.

The current Ninja model is simple, every time it is invoked, it will re-read
build.ninja instructions, check stats of artifacts and do the scheduling.

Instead of making a long-lived compiler, we propose to have a long-lived
scheduler. The complexity of a scheduler is significantly lower than a
compiler. Having an in-memory scheduler will help reducing redundant work such
as parsing build.ninja instructions which is around of size 2M for 10k files.
With the integration of watch mode, it does not need stat all artifacts each
time, this should help increase the scalability of scheduler to 100K files or
even more.

## Tests on a synthetic benchmark

Our synthetic bench is borrowed from
[OMake](http://blog.camlcity.org/blog/omake1.html) and public available
[here](https://github.com/bucklescript/bsb-bench).

> The benchmark has these characteristics: The task is to build n^2 libraries
> with n^2 modules each (for a given small number n), and the dependencies
> between the modules are created in a way so that we can stress both the
> dependency analyzer of the build utility and the ability to run commands in
> parallel.

We modified the benchmark to add interface file for each implementation file.

The benchmark is running on MacBook Pro 18 with CPU 2.6 GHz Intel Core i7,
Memory 32 GB 2400 MHz DDR4

The test is running against `n = 3,5,7,9`, where the source code size would be
`2*3^4=162`, `2*5^4=1250`, `2*7^4=4802`, `2*9**4=13122`.

Below is what we get for a cold build from scratch:

| Source size | Clean build (ms) |
| ----------- | ---------------- |
| 0162        | 684              |
| 1250        | 5,100            |
| 4802        | 24,112           |
| 13122       | 125,248          |

| Source size | Nop build (ms) | Touching root module (m_1_1_1_1.ml) | Touching (m_1_1_1.mli) |
| ----------- | -------------- | ----------------------------------- | ---------------------- |
| 0162        | 16             | 59                                  | 54                     |
| 1250        | 79             | 120                                 | 133                    |
| 4802        | 266            | 369                                 | 367                    |
| 13122       | 728            | 963                                 | 962                    |

We can see from the table as size grows, the time spent in edit-build cycle shifts from compilation to the scheduler, this is due to the fact that Ninja scheduler does not save the work for each test.

| Source size | Adding value to root module (m_1_1_1_1.ml) | Changing root interface |
| ----------- | ------------------------------------------ | ----------------------- |
| 0162        | 56                                         | 70                      |
| 1250        | 131                                        | 155                     |
| 4802        | 370                                        | 428                     |
| 13122       | 969                                        | 991                     |

The verbose build log for adding values on source of size `13122` is as below:

```sh
test9>ninja -C lib/bs -v
ninja: Entering directory `lib/bs'
[1/2] /usr/local/lib/node_modules/bs-platform/lib/bsc.exe    -w -30-40+6+7+27+32..39+44+45+101 -nostdlib -I '/Users/hongbozhang/git/bsb-bench/test9/node_modules/bs-platform/lib/ocaml' -color always -c -o src/dir_1_1/m_1_1_1_1.mlast -bs-syntax-only -bs-binary-ast /Users/hongbozhang/git/bsb-bench/test9/src/dir_1_1/m_1_1_1_1.ml
[2/2] /usr/local/lib/node_modules/bs-platform/lib/bsb_helper.exe  -g 0 -MD src/dir_1_1/m_1_1_1_1.mlast
[1/5905] /usr/local/lib/node_modules/bs-platform/lib/bsc.exe -bs-package-name test  -bs-package-output commonjs:lib/js/src/dir_1_1 -bs-assume-has-mli -bs-no-builtin-ppx-ml -bs-no-implicit-include   -I src/dir_4_6 -I src/dir_4_1 -I src/dir_4_8 -I src/dir_6_5 -I src/dir_6_2 -I src/dir_8_3 -I src/dir_8_4 -I src/dir_2_3 -I src/dir_2_4 -I src/dir_6_3 -I src/dir_4_9 -I src/dir_6_4 -I src/dir_4_7 -I src/dir_2_5 -I src/dir_2_2 -I src/dir_8_5 -I src/dir_8_2 -I src/dir_9_9 -I src/dir_3_7 -I src/dir_1_3 -I src/dir_9_7 -I src/dir_3_9 -I src/dir_1_4 -I src/dir_7_6 -I src/dir_7_1 -I src/dir_5_5 -I src/dir_7_8 -I src/dir_5_2 -I src/dir_3_8 -I src/dir_9_6 -I src/dir_1_5 -I src/dir_9_1 -I src/dir_1_2 -I src/dir_3_6 -I src/dir_9_8 -I src/dir_3_1 -I src/dir_5_3 -I src/dir_5_4 -I src/dir_7_9 -I src/dir_7_7 -I src/dir_8_7 -I src/dir_2_9 -I src/dir_8_9 -I src/dir_2_7 -I src/dir_4_2 -I src/dir_6_8 -I src/dir_4_5 -I src/dir_6_1 -I src/dir_6_6 -I src/dir_2_1 -I src/dir_2_6 -I src/dir_8_8 -I src/dir_8_1 -I src/dir_2_8 -I src/dir_8_6 -I src/dir_6_7 -I src/dir_6_9 -I src/dir_4_4 -I src/dir_4_3 -I src/dir_7_2 -I src/dir_7_5 -I src/dir_5_8 -I src/dir_5_1 -I src/dir_5_6 -I src/dir_1_9 -I src/dir_3_4 -I src/dir_3_3 -I src/dir_9_4 -I src/dir_1_7 -I src/dir_9_3 -I src/dir_5_7 -I src/dir_7_4 -I src/dir_5_9 -I src/dir_7_3 -I src/dir_9_2 -I src/dir_1_1 -I src/dir_9_5 -I src/dir_1_6 -I src/dir_3_2 -I src/dir_1_8 -I src/dir_3_5  -w -30-40+6+7+27+32..39+44+45+101 -nostdlib -I '/Users/hongbozhang/git/bsb-bench/test9/node_modules/bs-platform/lib/ocaml' -color always -o src/dir_1_1/m_1_1_1_1.cmj -c  src/dir_1_1/m_1_1_1_1.mlast
File "/Users/hongbozhang/git/bsb-bench/test9/src/dir_1_1/m_1_1_1_1.ml", line 4, characters 4-5:
Warning 32: unused value a.
```

Ignoring the preprocess stage, We can see there are 5906 jobs scheduled but
only one job is processed. This is because adding a single value does not
change `m_1_1_1_1.cmj` so that the change of propogation is stopped
immediately.

The result is surprisingly good even if change the interface of root files, by
looking at the verbose build log:

```sh
test9>ninja -C lib/bs -v
ninja: Entering directory `lib/bs'
[1/2] /usr/local/lib/node_modules/bs-platform/lib/bsc.exe    -w -30-40+6+7+27+32..39+44+45+101 -nostdlib -I '/Users/hongbozhang/git/bsb-bench/test9/node_modules/bs-platform/lib/ocaml' -color always -c -o src/dir_1_1/m_1_1_1_1.mliast -bs-syntax-only -bs-binary-ast /Users/hongbozhang/git/bsb-bench/test9/src/dir_1_1/m_1_1_1_1.mli
[2/2] /usr/local/lib/node_modules/bs-platform/lib/bsb_helper.exe  -g 0 -MD src/dir_1_1/m_1_1_1_1.mliast
[1/5906] /usr/local/lib/node_modules/bs-platform/lib/bsc.exe -bs-package-name test  -bs-package-output commonjs:lib/js/src/dir_1_1 -bs-no-builtin-ppx-mli -bs-no-implicit-include  -I src/dir_4_6 -I src/dir_4_1 -I src/dir_4_8 -I src/dir_6_5 -I src/dir_6_2 -I src/dir_8_3 -I src/dir_8_4 -I src/dir_2_3 -I src/dir_2_4 -I src/dir_6_3 -I src/dir_4_9 -I src/dir_6_4 -I src/dir_4_7 -I src/dir_2_5 -I src/dir_2_2 -I src/dir_8_5 -I src/dir_8_2 -I src/dir_9_9 -I src/dir_3_7 -I src/dir_1_3 -I src/dir_9_7 -I src/dir_3_9 -I src/dir_1_4 -I src/dir_7_6 -I src/dir_7_1 -I src/dir_5_5 -I src/dir_7_8 -I src/dir_5_2 -I src/dir_3_8 -I src/dir_9_6 -I src/dir_1_5 -I src/dir_9_1 -I src/dir_1_2 -I src/dir_3_6 -I src/dir_9_8 -I src/dir_3_1 -I src/dir_5_3 -I src/dir_5_4 -I src/dir_7_9 -I src/dir_7_7 -I src/dir_8_7 -I src/dir_2_9 -I src/dir_8_9 -I src/dir_2_7 -I src/dir_4_2 -I src/dir_6_8 -I src/dir_4_5 -I src/dir_6_1 -I src/dir_6_6 -I src/dir_2_1 -I src/dir_2_6 -I src/dir_8_8 -I src/dir_8_1 -I src/dir_2_8 -I src/dir_8_6 -I src/dir_6_7 -I src/dir_6_9 -I src/dir_4_4 -I src/dir_4_3 -I src/dir_7_2 -I src/dir_7_5 -I src/dir_5_8 -I src/dir_5_1 -I src/dir_5_6 -I src/dir_1_9 -I src/dir_3_4 -I src/dir_3_3 -I src/dir_9_4 -I src/dir_1_7 -I src/dir_9_3 -I src/dir_5_7 -I src/dir_7_4 -I src/dir_5_9 -I src/dir_7_3 -I src/dir_9_2 -I src/dir_1_1 -I src/dir_9_5 -I src/dir_1_6 -I src/dir_3_2 -I src/dir_1_8 -I src/dir_3_5  -w -30-40+6+7+27+32..39+44+45+101 -nostdlib -I '/Users/hongbozhang/git/bsb-bench/test9/node_modules/bs-platform/lib/ocaml' -color always -o src/dir_1_1/m_1_1_1_1.cmi -c  src/dir_1_1/m_1_1_1_1.mliast
[2/5906] /usr/local/lib/node_modules/bs-platform/lib/bsc.exe -bs-package-name test  -bs-package-output commonjs:lib/js/src/dir_1_1 -bs-assume-has-mli -bs-no-builtin-ppx-ml -bs-no-implicit-include   -I ...  -color always -o src/dir_1_1/m_1_1_1_1.cmj -c  src/dir_1_1/m_1_1_1_1.mlast
[3/5906] /usr/local/lib/node_modules/bs-platform/lib/bsc.exe -bs-package-name test  -bs-package-output commonjs:lib/js/src/dir_1_1 -bs-assume-has-mli -bs-no-builtin-ppx-ml -bs-no-implicit-include   -I ...  -w -30-40+6+7+27+32..39+44+45+101 -nostdlib -I '/Users/hongbozhang/git/bsb-bench/test9/node_modules/bs-platform/lib/ocaml' -color always -o src/dir_1_1/m_1_1_2_2.cmj -c  src/dir_1_1/m_1_1_2_2.mlast
[4/5906] /usr/local/lib/node_modules/bs-platform/lib/bsc.exe -bs-package-name test  -bs-package-output commonjs:lib/js/src/dir_1_1 -bs-assume-has-mli -bs-no-builtin-ppx-ml -bs-no-implicit-include   -I ... -color always -o src/dir_1_1/m_1_1_2_1.cmj -c  src/dir_1_1/m_1_1_2_1.mlast
[5/5906] /usr/local/lib/node_modules/bs-platform/lib/bsc.exe -bs-package-name test  -bs-package-output commonjs:lib/js/src/dir_1_1 -bs-assume-has-mli -bs-no-builtin-ppx-ml -bs-no-implicit-include   -I ... -color always -o src/dir_1_1/m_1_1_2_3.cmj -c  src/dir_1_1/m_1_1_2_3.mlast
[6/5906] /usr/local/lib/node_modules/bs-platform/lib/bsc.exe -bs-package-name test  -bs-package-output commonjs:lib/js/src/dir_1_1 -bs-assume-has-mli -bs-no-builtin-ppx-ml -bs-no-implicit-include   -I ... -color always -o src/dir_1_1/m_1_1_2_9.cmj -c  src/dir_1_1/m_1_1_2_9.mlast
[7/5906] /usr/local/lib/node_modules/bs-platform/lib/bsc.exe -bs-package-name test  -bs-package-output commonjs:lib/js/src/dir_1_1 -bs-assume-has-mli -bs-no-builtin-ppx-ml -bs-no-implicit-include   -I ... -color always -o src/dir_1_1/m_1_1_2_4.cmj -c  src/dir_1_1/m_1_1_2_4.mlast
[8/5906] /usr/local/lib/node_modules/bs-platform/lib/bsc.exe -bs-package-name test  -bs-package-output commonjs:lib/js/src/dir_1_1 -bs-assume-has-mli -bs-no-builtin-ppx-ml -bs-no-implicit-include   -I ... -color always -o src/dir_1_1/m_1_1_2_5.cmj -c  src/dir_1_1/m_1_1_2_5.mlast
[9/5906] /usr/local/lib/node_modules/bs-platform/lib/bsc.exe -bs-package-name test  -bs-package-output commonjs:lib/js/src/dir_1_1 -bs-assume-has-mli -bs-no-builtin-ppx-ml -bs-no-implicit-include   -I ... -color always -o src/dir_1_1/m_1_1_2_6.cmj -c  src/dir_1_1/m_1_1_2_6.mlast
[10/5906] /usr/local/lib/node_modules/bs-platform/lib/bsc.exe -bs-package-name test  -bs-package-output commonjs:lib/js/src/dir_1_1 -bs-assume-has-mli -bs-no-builtin-ppx-ml -bs-no-implicit-include   -I ... -color always -o src/dir_1_1/m_1_1_2_7.cmj -c  src/dir_1_1/m_1_1_2_7.mlast
[11/5906] /usr/local/lib/node_modules/bs-platform/lib/bsc.exe -bs-package-name test  -bs-package-output commonjs:lib/js/src/dir_1_1 -bs-assume-has-mli -bs-no-builtin-ppx-ml -bs-no-implicit-include   -I ... -color always -o src/dir_1_1/m_1_1_2_8.cmj -c  src/dir_1_1/m_1_1_2_8.mlast
```

The log means that there are 5906 jobs to be scheduled but only 11 jobs
processed. This is because there only 9 direct dependencies, the cmj/cmi
changes' propogation stops in the first level of dependency, indirect
dependencies are not propagated any more since the intermediate representation
stays stable after that. There are 9 more jobs to do compared with changing
implementation but the latency only increased by 30ms, which means the
scheduler does a good job in parallelism.

The real world scenario may be a bit worse than our synthetic benchmark, since
our syntethic benchmark only impose dependencies across implementation files,
the dependencies across interface files are flat.

To conclude, currently BuckleScript scales very well to projects with around
10k files, and with some work in improving the scheduler, we believe it is not
too difficult to scale it to 100K files or even more. To reach such enormous
scalability in a reliable way, we propose to have a long-lived scheduler and
fast cold compiler. The language interface design and the design of the data
structure for the implementation will make the longest rebuild propogation
chain bounded by two in most cases.
